Abstract. We propose a method for inferring parameterized regular types for 
logic programs as solutions for systems of constraints over sets of finite ground 
Herbrand terms (set constraint systems). Such parameterized regular types gen- 
eralize parametric regular types by extending the scope of the parameters in the 
type definitions so that such parameters can relate the types of different predi- 
cates. We propose a number of enhancements to the procedure for solving the 
constraint systems that improve the precision of the type descriptions inferred. 
The resulting algorithm, together with a procedure to establish a set constraint 
system from a logic program, yields a program analysis that infers tighter safe 
approximations of the success types of the program than previous comparable 
work, offering a new and useful efficiency vs. precision trade-off. This is sup- 
ported by experimental results, which show the feasibility of our analysis. 



1 Introduction 

Type inference of logic programs is the problem of computing, at compile time, a repre- 
sentation of the terms that the predicate arguments will be bound to during execution of 
the program. This kind of type inference involves not only assigning types to procedure 
arguments out of a predefined set of type definitions, as in traditional type inference, 
but also the more complex problem of inferring the type definitions themselves, simi- 
larly to what is done in shape analysis. Although most logic programming languages 
are either untyped or allow mixing typed and untyped code, inferring type information 
for the entire program is important since it allows the compiler to generate more ef- 
ficient code and it has well-known advantages in detecting programming errors early. 
For instance, simple uses of such information include better indexing, specialized uni- 
fication, and more efficient code generation. More advanced uses include compile-time 
garbage collection and non-termination detection. There are also other areas in which 
type information can be useful. For example, during verification and debugging it can 
provide information to the programmer that is not straightforward to obtain by manual 
inspection of the program. 

In this paper we use the set constraint-based approach 0141151 . We propose an al- 
gorithm for solving a set constraint system that relates the set of possible values of the 
program variables, by transforming it into a system whose solutions provide the type 
definitions for the variables involved in parameterized regular form. We focus on types 



(a) Program:

:- typing append(A1,A2,A3).

append([],L,L).
append([X|Xs],Ys,[X|Zs]) :-
    append(Xs,Ys,Zs).

(b) Parametric regular types:

:- type A1(X) -> [] | [X|A1(X)].
:- type A2(W) -> W.
:- type A3(Y,Z) -> Z | [Y|A3(Y,Z)].

(c) Parameterized regular types:

:- type A1 -> [] | [X|A1].
:- type A2 -> .
:- type A3 -> A2 | [X|A3].

(d) Local parameters:

:- typing append(A1(E),A2(T),A3(E,T)).
:- typing nrev(N1(E),N2(E)),
          append(A1(E),A2([]),A3(E,[])).

(e) Global parameters:

:- typing nrev(R1,R2).
:- type R1 -> [] | [X|R1].
:- type R2 -> [] | [X|R2].

Fig. 1. Parametric vs parameterized regular types



which are conservative approximations of the meaning of predicates, and hence, over- 
approximations of the success set (in contrast to the approach of inferring well-typings, 
as in, e.g., 131211, which may differ from the actual success set of the program). Type 
inference via set constraint solving was already proposed in 1121151 . However, most ex- 
isting algorithms 11 1 31 1 41911 are either too complicated or lacking in precision for some 
classes of programs. We try to alleviate these problems by, first, generating simple equa- 
tions; second, using a comparatively straightforward procedure for solving them; and, 
third, using a non-standard operation during solving that improves precision by "guess- 
ing" values. At the same time, we attack a more ambitious objective since our resulting 
types, that we call parameterized, are more expressive than in previous proposals. 

Consider for instance the append/3 program in Fig. 1a, and assume the type
descriptors A1, A2, and A3 for each predicate argument. Most state-of-the-art analyses will
simply infer that the first argument (type A1) is a list, leaving open the types of the other
two arguments. In Fig. 1b we show classical parametric types for append/3. We are
unaware of an existing proposal that is able to infer them as an approximation to the
success set of the predicate (see Sec. 5). Even if we had an analysis that inferred them,
they would still be less expressive (or need more elaboration for the same expressiveness)
than our proposal, as we will discuss.

The parametric types in Fig. 1b denote the expected list type for A1, which is
parametric on some X. Note that A2 is unbound since it may be instantiated to any term.
On the other hand, A3 is an open-ended list of elements of some type Y whose tail is of
some type Z. A key observation is that while there is a clear relation between the type of
the elements of A1 and A3, and between the type of the tails in A2 and A3, these relations
are not captured. In Fig. 1c, we show a desirable, more accurate type for append/3. It
denotes that the type of A3 is that of open-ended lists of elements of the same type as
A1 with tail of the same type as A2.

The relations between the types of arguments can be captured with parametric types
only if type parameters are instantiated. For example, the first typing in Fig. 1d captures
the desired relations for append/3. Instead, by using global type parameters, as we
propose, the parametric type definitions may exhibit the required exact relations right
from the inference of the type definitions alone. Although the absence of typings to
express the same relations is a small advantage, a bigger one might be expected with
regard to the analysis. By using parametric types, an analysis should not only infer type
definitions but also typings showing the exact values for the type parameters. It is not
clear at all how this might be done (note that typings are not the types of calls). Our
proposal infers type definitions alone, as usual, and yet these are at least as expressive as
parametric types with typings.

Consider a program construct where two different predicates share a variable, so
that the corresponding arguments have the same type. This happens, for example, with
the arguments of nrev/2 and append/3 in the classical naive reverse program (see
Ex. 1). This property cannot be captured with (standard) typings for the predicates if
parametric type definitions are used. One would need something like the second typing
in Fig. 1d. However, this is not usually a valid typing (it types several predicate atoms
at the same time), and it is also not intuitive how it could be inferred. In contrast, the
parameterized type definition of Fig. 1e, together with those of (c), easily captures the
property by sharing the global type parameter X. Therefore, more precise types
(called parameterized regular types [19]) can be produced if the scope of each type
variable in a type definition is broader than the definition itself, so that the types of
different arguments can be related.

As discussed in detail in Sec. 5, we believe that no previous proposal exists for 
inferring types for logic programs with the expressive power of parameterized regular 
types. In addition, our proposal fits naturally in the set constraint-based approach. In 
this context we also define a number of enhancements to the solving procedure. The 
result is a simple but powerful type inference analysis. Our preliminary experiments 
also show that our analysis runs in a reasonable amount of time. 

2 Preliminaries 

Let $V$ be a non-terminal which ranges over (set) variables $\mathcal{V}$, and let $f$ range over
functors (constructors) $\mathcal{F}$ of given arity $n \geq 0$. Set expressions, set expressions in regular
form, and in parameterized form are given by $E$ in the following grammars:

Expressions: $E ::= \emptyset \mid V \mid f(E_1,\ldots,E_n) \mid E_1 \cup E_2 \mid E_1 \cap E_2$

Regular form: $E ::= \emptyset \mid N \qquad N ::= V \mid f(V_1,\ldots,V_n) \mid N_1 \cup N_2$

Parameterized form: $E ::= \emptyset \mid N \qquad N ::= N_1 \cup N_2 \mid R \qquad R ::= V \mid f(V_1,\ldots,V_n) \mid V \cap R$

The meaning of a set expression is a set of (ground, finite) terms, and is given by the
following semantic function $\mu$ under an assignment $\sigma$ from variables to sets of terms:

$\mu(\emptyset,\sigma) = \emptyset \qquad \mu(V,\sigma) = \sigma(V)$

$\mu(E_1 \cup E_2,\sigma) = \mu(E_1,\sigma) \cup \mu(E_2,\sigma) \qquad \mu(E_1 \cap E_2,\sigma) = \mu(E_1,\sigma) \cap \mu(E_2,\sigma)$

$\mu(f(E_1,\ldots,E_n),\sigma) = \{\, f(t_1,\ldots,t_n) \mid t_i \in \mu(E_i,\sigma) \,\}$

Let $E_1$ and $E_2$ be two set expressions, then a set equation (or equation, for short)
is of the form $E_1 = E_2$. A set equation system is a set of set equations. A solution $\sigma$
of a system of equations $S$ is an assignment that maps variables to sets of terms which
satisfies $\mu(e_1,\sigma) = \mu(e_2,\sigma)$ for all $(e_1 = e_2) \in S$. We will write $S \vdash e_1 = e_2$ iff every
solution of $S$ is a solution of $e_1 = e_2$. A set equation system is in top-level form if in all
expressions of the form $f(x_1,\ldots,x_n)$, all the $x_i$ are variables. The top-level variables of
a set expression are the variables which occur outside the scope of a constructor.
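The semantic function can be made concrete on finite assignments. The following is a minimal sketch, not part of the analysis itself: the tuple encoding of set expressions and ground terms, and the helper names `mu` and `is_solution`, are assumptions introduced only for illustration.

```python
from itertools import product

# Assumed encoding of set expressions as nested tuples:
#   ("empty",), ("var", "X"), ("fun", "f", (e1, ..., en)),
#   ("union", e1, e2), ("inter", e1, e2)
# Ground terms are tuples (functor, arg1, ..., argn); constants are 1-tuples.

def mu(expr, sigma):
    """Evaluate a set expression under sigma (dict: variable -> set of terms)."""
    tag = expr[0]
    if tag == "empty":
        return set()
    if tag == "var":
        return set(sigma[expr[1]])
    if tag == "union":
        return mu(expr[1], sigma) | mu(expr[2], sigma)
    if tag == "inter":
        return mu(expr[1], sigma) & mu(expr[2], sigma)
    if tag == "fun":
        # One term per combination of argument terms (a constant has no arguments).
        arg_sets = [mu(arg, sigma) for arg in expr[2]]
        return {(expr[1], *terms) for terms in product(*arg_sets)}
    raise ValueError(f"unknown expression tag: {tag}")

def is_solution(equations, sigma):
    """sigma solves the system iff both sides of every equation denote the same set."""
    return all(mu(lhs, sigma) == mu(rhs, sigma) for lhs, rhs in equations)

a, b = ("a",), ("b",)
sigma = {"X": {a, b}, "Y": {a}}
pair = ("fun", "f", (("var", "X"), ("var", "Y")))
pair_terms = mu(pair, sigma)
meet_terms = mu(("inter", ("var", "X"), ("var", "Y")), sigma)
y_is_a = is_solution([(("var", "Y"), ("fun", "a", ()))], sigma)
```

Real solutions are in general infinite sets of terms (e.g., all lists), so this evaluator only illustrates the definitions on finite fragments.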



A standard set equation system is one in which all equations are of the form V = E 
(i.e., lhs are variables and rhs set expressions) and there are no two equations with the 
same lhs. Equations where the rhs is also a variable will be called aliases. In a standard 
set equation system variables which are not in the lhs of any equation are called free 
variables (since they are not constrained to any particular value). Variables which do 
appear in lhs are called non-free. 

A regular set equation system is one which is standard, all rhs are in regular form,
has no top-level variables except for aliases, and also no free variables. A regular set
equation system is in direct syntactic correspondence with a set of regular type
definitions. Regular types are equivalent to regular term grammars where the type definitions
are the grammar rules. In a regular set equation system the set variables act as the type
symbols, and each equation of the form $x = e_1 \cup \ldots \cup e_n$ acts as $n$ grammar rules of the
form $x ::= e_i$, $1 \leq i \leq n$. By generalizing regular set equation systems to allow free
variables what we obtain is the possibility of having parameters within the regular type
definitions. However, when free variables are allowed intersection has to be allowed
too: given that free variables are not constrained to any particular value, intersections
cannot be "computed out."

A leaf-linear set equation system [19] is one which is standard, all rhs are in
parameterized form, and all top-level variables are free. Note that leaf-linear set equation
systems are the minimal extension of regular equation systems in the above mentioned
direction, in the sense that intersections are reduced to a minimal expression: several
free variables and only one (if any) constructor expression. More importantly, a
leaf-linear set equation system is more expressive than parametric regular types, since
parameters have the scope of the whole system, instead of the particular type definition in
which they occur, as in parametric type definitions.

3 Type inference 

In this section, we present the different components of our analysis method for inferring 
parameterized regular types. First, a set equation system is derived from the syntax of 
the program, then, the system is solved, and finally it is projected onto the program vari- 
ables providing the type definitions for such variables. The resulting equation system 
is in solved form, i.e., it is leaf-linear. Such a system will be considered a fair rep- 
resentation of the solution to the original set equation system, since it denotes a set of 
parameterized regular types. When the system is reduced to solved form, the parameters 
of such a solution are the free variables. 

Generating a set equation system for a program. Let $P$ be a program, $\Pi_P$ the set of
predicate symbols in $P$, ranked by their arity, and $P|_p$ the set of rules defining predicate
$p$ in program $P$. Our analysis assumes that all rules in $P$ have been renamed apart so
that they do not have variables in common. With each $p \in \Pi_P$ of arity $n$ we associate a
signature of $p$ defined as $\Sigma(p) = p(x_1,\ldots,x_n)$, where $\{x_1,\ldots,x_n\}$ is an ordered set of $n$
new variables, one for each argument of $p$. For an atom $A$, let $[A]_j$ denote its $j$-th argument.
For a predicate $p \in \Pi_P$ with arity $n$ we define $C_p$ and the initial set equation system, $E$,
for $P$ as follows. In order to avoid overloading the symbol $\cup$, and to clarify the presentation,
we will use $\sqcup$ for the usual set union, while $\cup$ will stand for the symbol occurring in set
expressions.

$E = \bigsqcup \{\, C_p \mid p \in \Pi_P \,\} \qquad C_p = C_{Head} \sqcup C_{Body} \qquad (1)$

$C_{Head} = \{\, x_j = \bigcup \{ [H]_j \mid (H \,\text{:-}\, B) \in P|_p \} \mid x_j \in vars(\Sigma(p)) \,\}$

$C_{Body} = \{\, y = \bigcap \{ [\Sigma(A)]_j \mid [A]_j = y,\ A \in B \} \mid (H \,\text{:-}\, B) \in P|_p,\ y \in vars(B) \,\}$

$\qquad \sqcup\ \{\, y = [\Sigma(A)]_j \cap t \mid [A]_j = t,\ A \in B,\ (H \,\text{:-}\, B) \in P|_p,\ t \notin vars(B),\ y \text{ fresh var} \,\}$

Example 1. Take signatures append(A1,A2,A3) and nrev(N1,N2) in the following
program for naive reverse. The equation system for nrev/2 is $C_{nrev/2} = C_H \sqcup C_B$.

nrev([],[]).
nrev([X|Xs],Ys) :- nrev(Xs,Zs), append(Zs,[X],Ys).

$C_H = \{ N1 = [] \cup [X|Xs],\ N2 = [] \cup Ys \}$

$C_B = \{ Xs = N1,\ Ys = A3,\ Zs = N2 \cap A1,\ W = [X] \cap A2 \}$
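As an illustration of how the head and body equations arise, here is a minimal sketch that rebuilds the equations of the naive reverse example. It makes several simplifying assumptions that are not in the text: terms are plain strings, signatures are passed as a dictionary `sig`, fresh variables are named `W1`, `W2`, ..., and each right-hand side is returned as a list of operands (a union for head equations, an intersection for body equations). The function names are hypothetical.

```python
from itertools import count

def is_var(term):
    """Prolog-style variable test on a term given as text (assumed encoding)."""
    return term.isidentifier() and (term[0].isupper() or term[0] == "_")

def generate(clauses, sig):
    """Return (head_eqs, body_eqs) for clauses of the form ((pred, args), body)."""
    head_eqs, body_eqs = {}, {}
    fresh = count(1)
    for (pred, head_args), body in clauses:
        # Head: the j-th signature variable is the union of the j-th head arguments.
        for x, t in zip(sig[pred], head_args):
            head_eqs.setdefault(x, []).append(t)
        # Body: a variable argument is the intersection of the signature
        # variables of all positions where it occurs; a non-variable argument
        # gets a fresh variable equated to (signature variable) ∩ (term).
        occurrences = {}
        for body_pred, body_args in body:
            for x, t in zip(sig[body_pred], body_args):
                if is_var(t):
                    occurrences.setdefault(t, []).append(x)
                else:
                    body_eqs[f"W{next(fresh)}"] = [x, t]
        body_eqs.update(occurrences)
    return head_eqs, body_eqs

sig = {"nrev": ["N1", "N2"], "append": ["A1", "A2", "A3"]}
clauses = [
    (("nrev", ["[]", "[]"]), []),
    (("nrev", ["[X|Xs]", "Ys"]),
     [("nrev", ["Xs", "Zs"]), ("append", ["Zs", "[X]", "Ys"])]),
]
head_eqs, body_eqs = generate(clauses, sig)
```

The sketch uses a single signature per predicate; renaming signatures apart for each body atom, as the analysis of program levels requires, is deliberately left out.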

Note that a system $E$ which results from Eq. 1 is in standard form. Moreover, to
put it also in top-level form we only need to repeatedly rewrite every subexpression of
every equation of $E$ of the form $f(e_1,\ldots,e_j,\ldots,e_n)$ into $f(e_1,\ldots,y_j,\ldots,e_n)$, whenever
$e_j$ is not a variable, adding to $E$ the equation $y_j = e_j$, with $y_j$ a new fresh variable, until no
further rewriting is possible. The new equations added to $E$ are in turn also rewritten in
the same process. We call the resulting system $Eq(P)$. Obviously, $Eq(P)$ is equivalent
to $E$ in Eq. 1.
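The rewriting into top-level form can be sketched as a worklist procedure. This is an illustrative sketch only: the tuple encoding of expressions, the fresh-variable naming scheme `_Y1`, `_Y2`, ..., and the function name `to_top_level` are assumptions made here.

```python
from itertools import count

def to_top_level(equations):
    """Rewrite a standard system (dict: variable -> expression) into top-level form.

    Expressions are nested tuples: ("empty",), ("var", name),
    ("fun", name, args), ("union", e1, e2), ("inter", e1, e2).
    """
    fresh = count(1)
    pending = list(equations.items())
    result = {}

    def flatten(expr):
        tag = expr[0]
        if tag in ("union", "inter"):
            return (tag, flatten(expr[1]), flatten(expr[2]))
        if tag == "fun":
            new_args = []
            for arg in expr[2]:
                if arg[0] == "var":
                    new_args.append(arg)
                else:
                    # Replace the non-variable argument by a fresh variable and
                    # queue the new equation so that it is rewritten in turn.
                    y = f"_Y{next(fresh)}"
                    pending.append((y, arg))
                    new_args.append(("var", y))
            return ("fun", expr[1], tuple(new_args))
        return expr

    while pending:
        lhs, rhs = pending.pop(0)
        result[lhs] = flatten(rhs)
    return result

# W = [X] ∩ A2, with the list [X] written as '.'(X, [])
nil = ("fun", "[]", ())
system = {"W": ("inter", ("fun", ".", (("var", "X"), nil)), ("var", "A2"))}
flat = to_top_level(system)
```

In the example, the constant `[]` inside the list constructor is moved into its own equation, so every constructor argument in the result is a variable.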

Analysis of the program. In order to analyze a program and infer its types, we follow 
the call graph of the program bottom-up, as explained in the following. First, the call 
graph of the program is built and its strongly connected components analyzed. Nodes 
in the same component are replaced by a single node, which corresponds to the set 
of predicates in the original nodes. The (incoming or outgoing) edges in the original 
nodes are now edges of the new node. The new graph is partitioned into levels. The 
first level consists of the nodes which do not have outgoing edges. Each successive 
level consists of nodes which have outgoing edges only to nodes of lower levels. The 
analysis procedure processes each level in turn, starting from the first level. Predicates 
in the level being processed can be analyzed one at a time or all at once. 
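The partition of the call graph into levels can be sketched as follows. This is a minimal sketch under assumptions introduced here: the call graph is a dictionary from each predicate to the set of predicates it calls, components are computed by mutual reachability (adequate for small graphs, not the most efficient method), and the function names are hypothetical.

```python
def reachable(calls, start):
    """Predicates reachable from start through one or more call edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def scc_levels(calls):
    """Return the levels, lowest first; each level is a set of components."""
    nodes = set(calls) | {q for callees in calls.values() for q in callees}
    reach = {p: reachable(calls, p) for p in nodes}
    # Two predicates share a component iff each one is reachable from the other.
    comp = {p: frozenset(q for q in nodes
                         if q == p or (q in reach[p] and p in reach[q]))
            for p in nodes}
    level = {}

    def level_of(c):
        if c not in level:
            callees = {comp[q] for p in c for q in calls.get(p, ())} - {c}
            # Level 1 has no outgoing edges; otherwise one above the highest callee.
            level[c] = 1 + max((level_of(d) for d in callees), default=0)
        return level[c]

    by_level = {}
    for c in set(comp.values()):
        by_level.setdefault(level_of(c), set()).add(c)
    return [by_level[k] for k in sorted(by_level)]

calls = {
    "same/2": {"nrev/2"},
    "nrev/2": {"nrev/2", "append/3"},
    "append/3": {"append/3"},
}
levels = scc_levels(calls)
```

Here `append/3` is analyzed first, then `nrev/2`, and finally `same/2`.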

Each graph level is a subprogram of the original program. To analyze a level,
equations are set up for its predicates as by Eq. 1 and copies of the solutions already obtained
for the predicates in lower levels are added. To do this, the signatures and solutions of 
the predicates in lower levels are renamed apart. The new signatures replace the old 
ones when used in building the equations for the subprogram. The new copies of the 
solutions are added to the set equation system. For a given predicate, there is a different 
copy for each atom of that predicate which occurs in the subprogram being analyzed. 

Example 2. Consider the following (over-complicated) contrived predicate to declare
two lists identical, which calls predicate nrev/2 of Ex. 1 twice.

same(L1,L2) :- nrev(L1,L), nrev(L,L2).



Predicate nrev/2, which is in a lower scc, would have been analyzed first. Taking
signatures same(S1,S2) and nrev(N1,N2), the solution for nrev/2 would be:

$\{ N1 = [] \cup [X|N1],\ N2 = [] \cup [X|N2] \}$

Since there are two calls to this predicate in the above definition for same/2, two copies
of the above equations (renaming the above equations for (N1,N2) into (N11,N12) and
(N21,N22)) would be added. The initial equation system for same/2 is thus:

$\{ S1 = L1,\ S2 = L2,\ L1 = N11,\ L2 = N22,\ L = N12 \cap N21,$
$\ \ N11 = [] \cup [X1|N11],\ N21 = [] \cup [X2|N21],$
$\ \ N12 = [] \cup [X1|N12],\ N22 = [] \cup [X2|N22] \}$

Note that the two arguments of same/2 have, in principle, list elements of different type 
(X1 and X2). That they are in fact of the same type will be recovered during the solving 
of the equations, in particular, the equation for L, the rhs of which is an intersection 
(unification). This will be done by an operation we propose (BIND) which will make 
an appropriate "guess", as explained later. 

Note that the copies are required because the type variables involved might get new 
constraints during analysis of the subprogram. Such constraints are valid only for the 
particular atom related to the given copy. Also, as the example above shows, using a 
single copy would impose constraints, because of the sharing of the same parameter 
between equations referring to the same and single copy (as it would have been the case 
with X in the example, had we not copied the solution for nrev/2). Such constraints 
might be, in principle, not true (though in the example it is finally true that both lists 
have elements of the same type). Thus, copies represent the types of the program atoms 
occurring at the different program points (i.e., different call patterns). 

Note also that equations with intersection in the expression in the rhs will be used
to capture unification, as is the case also in the example. This occurs sometimes
by simplification of intersection (procedure SIMP below), but most of the time from
operation BIND, which mimics unification. Such equations are particularly useful. This
is also true even if the equation is such that its lhs is a variable which does not occur
anywhere else in the program (as is the case of W in Ex. 1). Additionally, in these cases,
if the rhs finally becomes the empty set, denoting a failure, this needs to be propagated
explicitly, since the lhs is a variable not related to the rest of the equations.

The details of the solving procedure are explained below. The procedure is based 
on the usual method of treating one equation at a time, the lhs of which is a variable, 
and replacing every (top-level) occurrence of the variable by the expression in the rhs 
everywhere else. Once the system is set up as explained, it is first normalized and then 
solved as described in the following. 

Normal form. The normal form used is Disjunctive Normal Form (DNF), plus some 
simplifications based on equality axioms for $\cap$, $\cup$, and $\emptyset$, which transform expressions 
into parameterized form. Once the system is in top-level form, two auxiliary algorithms 
are used to rewrite equations to achieve normal form. These algorithms are based on 
semantic equivalences, and therefore preserve solutions. 



The first algorithm, DNF, puts set expressions in a set equation in disjunctive normal 
form. Note that if all expressions of an equation q are in top-level form then those of 
equation DNF(q) are in parameterized form, except for nested occurrences of $\emptyset$ and 
possibly several occurrences of constructor expressions in conjuncts. This is taken care 
of in the second algorithm, SIMP. 

SIMP simplifies set expressions in an equation system E by repeatedly rewriting 
every subexpression based on several equivalences until no further rewriting is possible, 
as follows: 

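1. $e \cap \emptyset \leadsto \emptyset$   INTER. SIMPLIFICATION

2. $e \cap e \leadsto e$   INTER. ABSORPTION

3. $e \cup \emptyset \leadsto e$   UNION SIMPLIFICATION

4. $e \cup e \leadsto e$   UNION ABSORPTION

5. $e_1 \cup (e_1 \cap e_2) \leadsto e_1$   SUBSUMPTION

6. $f(e_1,\ldots,e_m) \cap g(d_1,\ldots,d_n) \leadsto \emptyset$ if $f \neq g$ or $n \neq m$   CLASH

7. $f(e_1,\ldots,e_n) \cap f(d_1,\ldots,d_n) \leadsto f(y_1,\ldots,y_n)$ if $\forall j.\ 1 \leq j \leq n.\ y_j = e_j \cap d_j$   INTER. DISTRIBUTION

8. $f(y_1,\ldots,y_n) \leadsto \emptyset$ if $\exists i.\ 1 \leq i \leq n.\ E \vdash y_i = \emptyset$   EMPTINESS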

Note that line 8 makes use of the check $E \vdash y_i = \emptyset$. For this test a straightforward 
adaptation of the type emptiness test of Q to parameterized definitions is used. Note 
also that the rewriting in line 7 deserves some explanation. If the set equation system
$E$ does not contain an equation $y_j = e_j \cap d_j$ for some $j$ then the equation is added to
$E$, with $y_j$ a new fresh variable. Otherwise, $y_j$ from the existing equation is used. If
equations were not added, it would prevent full normalization of the set expressions. If 
variables were not reused, but instead new variables added each time, it would prevent 
the global algorithm from terminating. 
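The purely local rewriting rules (rules 1 to 6) can be sketched as a bottom-up pass over an expression. This is only a partial illustration: rules 7 and 8 are omitted because they need access to the equation system, matching is done up to syntactic equality only (no commutativity or associativity), and the tuple encoding and the name `simp` are assumptions of this sketch.

```python
EMPTY = ("empty",)

def simp(expr):
    """Apply rules 1-6 bottom-up to a tuple-encoded set expression."""
    tag = expr[0]
    if tag == "fun":
        return ("fun", expr[1], tuple(simp(arg) for arg in expr[2]))
    if tag not in ("union", "inter"):
        return expr
    left, right = simp(expr[1]), simp(expr[2])
    if tag == "inter":
        if EMPTY in (left, right):                      # rule 1
            return EMPTY
        if left == right:                               # rule 2
            return left
        if (left[0] == "fun" and right[0] == "fun"
                and (left[1] != right[1] or len(left[2]) != len(right[2]))):
            return EMPTY                                # rule 6 (clash)
        return ("inter", left, right)
    if left == EMPTY:                                   # rule 3
        return right
    if right == EMPTY:
        return left
    if left == right:                                   # rule 4
        return left
    if right[0] == "inter" and left in right[1:]:       # rule 5
        return left
    if left[0] == "inter" and right in left[1:]:
        return right
    return ("union", left, right)

a, b = ("fun", "a", ()), ("fun", "b", ())
X = ("var", "X")
clash = simp(("union", ("inter", a, b), X))      # (a ∩ b) ∪ X
subsumed = simp(("union", X, ("inter", X, a)))   # X ∪ (X ∩ a)
kept = simp(("inter", X, a))                     # X ∩ a cannot be reduced
```

An intersection of a free variable with a constructor expression, such as `X ∩ a`, is left untouched, which is exactly the kind of conjunct that parameterized form allows.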



Solving recurrence equations. Since the set equation system is kept in standard form 
at all times, the number of different equations that need to be considered is very small. 
The procedure CASE reduces a given equation into a simpler one. It basically takes 
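care only of recurrences. We call recurrence an equation $x = e$ where $x$ occurs top-level
in $e$. There are four cases of recurrences that are dealt with:

1. $x = x \leadsto x = \emptyset$

2. $x = x \cap e \Leftrightarrow x \subseteq e \leadsto x = \emptyset$

3. $x = x \cup e \Leftrightarrow e \subseteq x \leadsto x = e$

4. $x = (x \cap e_1) \cup e_2 \Leftrightarrow e_2 \subseteq x \subseteq e_1 \leadsto x = e_2$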

Note that in all cases we have chosen for variable x the least solution of all possible 
ones allowed by the corresponding recurrence. We are thus taking a minimal sufficient 
solution, in the sense that the resulting set of terms would be the smallest possible one 
that still approximates the program types. This is possible because we keep equations 
in standard form, so that their lhs is always a variable and there is only one equation per 
variable. If the equation turns into a recurrence, then the program contains a recursion 
which does not produce solutions (e.g., an infinite failure). 
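Since the least solution is chosen in every recurrence, the effect on an equation in disjunctive normal form can be read as dropping every disjunct in which the lhs variable occurs at top level. The sketch below illustrates this reading; it is an interpretation made for illustration, and the encoding (a rhs as a set of frozensets of atoms, the empty set of disjuncts standing for the empty set expression) and the name `case` are assumptions.

```python
def case(x, disjuncts):
    """Least-solution treatment of a recurrence x = d1 ∪ ... ∪ dk.

    Each disjunct is a frozenset of atoms read as their intersection;
    disjuncts mentioning x at top level contribute nothing to the least solution.
    """
    return {conj for conj in disjuncts if x not in conj}

P, Q, R = ("var", "P"), ("var", "Q"), ("var", "R")
a, b = ("fun", "a", ()), ("fun", "b", ())

# The system { P = P, Q = a ∪ Q, R = b ∪ (Q ∩ R) }
system = {
    P: {frozenset({P})},
    Q: {frozenset({a}), frozenset({Q})},
    R: {frozenset({b}), frozenset({Q, R})},
}
solved = {x: case(x, rhs) for x, rhs in system.items()}
```

On this system the sketch yields P empty, Q = a, and R = b, reflecting that the recursive clauses add no new terms to the success set.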



Example 3. Consider the following program with signatures p(P), q(Q), and r(R):



Algorithm 1 SOLVE

Input: a set equation system E in standard form.
Output: a set equation system S in solved form.

 1  initialize S
 2  repeat
 3    initialize C
 4    repeat
 5      subtract q from E
 6      (q', C) <- SIMP(DNF(q), C, S)
 7      q'' <- CASE(q', S)
 8      let q'' be of the form x = e
 9      replace every top-level occurrence of x in E and in S by e
10      add q'' to S
11    until E is empty
12    repeat
13      for all q $\in$ S do
14        (q, C) <- SIMP(DNF(q), C, S)
15        if q is of the form x = e, e $\neq \emptyset$, and S $\vdash$ x = $\emptyset$ then
16          replace q by x = $\emptyset$ in S
17        end if
18      end for
19    until no equation q is replaced
20    for all (x = $\emptyset$) $\in$ S do
21      if x occurs in Eq(P) for some clause c then
22        for all variable y in the head of c do
23          replace the equation in S with lhs y by y = $\emptyset$
24        end for
25      end if
26    end for
27    for all (x = e) $\in$ S do
28      replace every top-level occurrence of x in C by e
29    end for
30    add C to E
31    if E is empty then
32      E <- BIND(S)
33    end if
34  until E is empty



p(X) :- p(X).
q(a).
q(Y) :- q(Y).
r(b).
r(Z) :- q(Z), r(Z).

It is easy to see that its initial set equation system will progress towards the following
form: $\{ P = P,\ Q = a \cup Q,\ R = b \cup (Q \cap R) \}$.

The global algorithm. The global algorithm for solving a set equation system is given
as Algorithm SOLVE. In order to facilitate the presentation, the set equation system is
partitioned into three: E, C, and S. E is the initial equation system (generated from the
program as in Eq. 1), S is the solved system (i.e., the output of the analysis), and C is
an auxiliary system with the same form as E. This forces the use of SIMP in lines 6
and 14 with two input equation systems instead of one. SIMP may add new equations,
and these are added to C. Also, in SIMP the test in line 8 ($E \vdash y_i = \emptyset$) is carried along
in S (i.e., it should be read as $S \vdash y_i = \emptyset$ when called from SOLVE).

Once more, we use the test $S \vdash x = \emptyset$ (line 15 of Algorithm SOLVE). This is related
to the other use of this test in SIMP. Note that SIMP is invoked again at line 14 of
Algorithm SOLVE. Together, the loop in lines 12-19 of Algorithm SOLVE and line 8
of SIMP perform a complete type emptiness propagation. Because of this, the effect of
the rest of cases of SIMP, which propagate symbol $\emptyset$ in subexpressions, is to achieve
an implicit check for non-emptiness of intersections. The rationale behind the loop in
lines 20-26 is different. It solves a lack of propagation of failure for some programs,
due to the form of the initial equation system.

Example 4. In the program below, failure of p/1 would not be detected unless step 23
is performed, even if failure of q/2 is detected because of type emptiness.

p(X) :- q(b,X).
q(a,a).

Finally, the BIND procedure invoked at line 32 of Algorithm SOLVE is not needed 
for obtaining the solved equation system, since S is already in solved form when BIND 
is called, but it may improve the precision of the analysis. 

Example 5 (SOLVE with binding of free variables). We show the analysis of a contrived
example which exposes the strength of our approach in propagating types through the
type variables that act as parameters. We consider signature append(A1,A2,A3)
already solved with solution $A1 = [] \cup [X|A1]$, $A3 = A2 \cup [X|A3]$, and concentrate
on the analysis of:

appself(A,B) :- append(A,[],B).

We show the contents of E at the beginning of Algorithm SOLVE and of S at line 32 (C
and E are empty). We only show the equations for the program variables of appself/2.
The equations missing are those for the solution of append/3 (note that A2 is free) and
the aliases of variables A and B for the arguments of the signature of appself/2:

$E = \{ A = A1,\ B = A3,\ W = A2 \cap [] \}$

$S = \{ A = [] \cup [X|A1],\ B = A2 \cup [X|A3],\ W = A2 \cap [] \}$

Now, BIND can give a more precise value to the only free variable that appears in an
intersection, A2 ($A2 = []$). We now include the equation for A3, which is relevant:

$E = \{ A2 = [] \}$

$S = \{ A = [] \cup [X|A1],\ B = A2 \cup [X|A3],\ W = A2 \cap [],\ A3 = A2 \cup [X|A3] \}$

After a second iteration of the main loop (lines 2-34) we have E empty again, and:

$S = \{ A = [] \cup [X|A1],\ B = [] \cup [X|A3],\ W = [],\ A3 = [] \cup [X|A3] \}$

Finally, after projection on the variables of interest (which are in fact those of the
signature of appself/2, but we use its program variables A, B for clarity instead):

$S = \{ A = [] \cup [X|A],\ B = [] \cup [X|B] \}$



Binding free variables to values. Since the equation system is already solved, i.e.,
it is in solved form, when BIND is applied, free variables can at this moment take any
value. BIND will then mimic unification by binding free variables to the minimal values
required so that the expressions involving them have a solution. In order to do this, all
subexpressions with the form of an intersection (i.e., unification) with a free variable
are considered. These are called conjuncts.

Since the set expressions are in parameterized form, conjuncts are all of the form
$e_1 \cap e_2$, where $e_1$ is a, possibly singleton, conjunction of free variables, and $e_2$ a, possibly
non-existing, constructor expression. Let $e_1$ be of the form $x_1 \cap \ldots \cap x_n$, $n \geq 1$, and $e$
be $e_2$ if it exists or a new variable, otherwise. A set of candidates is proposed of the
form $\{ x_i = e \mid 1 \leq i \leq n \}$. For an equation $q$, let $Candidates(q)$ denote the set of sets
of candidates for each conjunct in $q$. For defining procedure BIND we need to consider
the following relation between equations. Let $q \mapsto q'$ when equation $q$ is the equation
taken by SIMP at a given time, and $q'$ the new equation added at step 7 of Algorithm
SIMP from $q$. Let $\mapsto^*$ denote the transitive closure of $\mapsto$. BIND constructs the formula
below, and synthesizes from it a suitable set of set equations.

$\bigwedge_{q \in Eq(P)} \ \bigvee_{q \mapsto^* q'} \ \bigvee_{c \in Candidates(q')} \ \bigwedge_{e \in c} e \qquad (2)$

Example 6. Consider the following contrived program for alternate/2, which resembles
the way the classical program for the towers of Hanoi problem alternates the arguments
representing the pegs that hold the disks across recursions:

alternate(A1,B1).
alternate(A2,B2) :- alternate(B2,A2).

It is easy to see that, for a signature alternate(P1,P2), the solution will be the system
of the following two equations:

$P1 = A1 \cup B1, \quad P2 = A1 \cup B1$

The analysis of a program containing an atom of the form alternate(a,b) in a
clause body will be faced with the equations:

$W1 = (A1 \cup B1) \cap a, \quad W2 = (A1 \cup B1) \cap b,$

which at line 32 of Algorithm SOLVE would be solved as:

$W1 = (A1 \cap a) \cup (B1 \cap a), \quad W2 = (A1 \cap b) \cup (B1 \cap b),$

so that BIND will find two original equations (in $Eq(P)$; not related by $\mapsto^*$) with two
candidates each ($\{\{A1 = a\}, \{B1 = a\}\}$ for the equation for W1 and $\{\{A1 = b\}, \{B1 = b\}\}$ for
the equation for W2). Thus, BIND obtains the formula:

$(A1 = a \lor B1 = a) \land (A1 = b \lor B1 = b)$, which is equivalent to:

$(A1 = a \land B1 = b) \lor (A1 = b \land B1 = a)$, and results in the following two
set equations, which "cover" all solutions of the above formula:

$A1 = a \cup b, \quad B1 = a \cup b,$



                        NFTA                      PTA
Benchmark  # Desc.  Prec.  Time (s)  Error   Prec.  Time (s)  Error   Bad call type
append        3       1     0.000      n       3     0.000      y     append(A,a,A)
blanchet      1       1     0.052      y       1     0.032      y     attacker(s)
dnf           2       2     1.156      n       2     1.892      n     dnf(X,a(z1,o(z2,z3)))
fib           2             0.000      n       2     0.000      y     fib(a,X)
grammar       1             0.016      y       1     0.124      y     parse([boxes,fly],S)
hanoi         5       1     0.020      n       5     0.028      n     hanoi(5,a,b,c,[mv(e,f)])
mmatrix       3       3     0.008      n       3     0.008      y     mmult([1,2],[[1,2],[3,4]],X)
mv            3       1     0.020      n       3     0.036      y     mv([1,3,1],[b,c,a],X)
pvgabriel     2             0.184      n       2     0.124      n     pv_init([1,2],X)
pvqueen       2       1     0.020      n       2     0.028      y     queens(4,[a,b,c,d])
revapp        2       2     0.004      n       2     0.004      y     rev([1,2],[a,b])
serialize     2       2     0.044      n       2     0.164      y     serialize("hello",[a,b,c])
zebra         7       1     0.080      n       7     0.372      y     zebra(E,S,J,U,second,Z,W)
Total        35      13     1.604      2      35     2.812     10

Table 1. Experimental results for NFTA and parameterized type analysis (PTA)



and (imply expressions for P1 and P2 which) correctly approximate the success types
of alternate/2 when called as in alternate(a,b).

In the synthesis of set equations from the formula in Eq. 2, $\lor$ turns into $\cup$ and $\land$
turns into $\cap$. The details of how to do this in a complete synthesis procedure are the
subject of current work. Our implementation currently deals with the simpler cases,
including those similar to the example above and those in which all candidates involve
only one free variable (as in Ex. 5).
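For the simple case in which every candidate binds a free variable to a constant, the synthesis step can be sketched as follows. This is not the complete synthesis procedure, which the text leaves open: the encoding of the formula as a list of clauses (each clause a list of candidate sets, each candidate set a set of (variable, value) pairs) and the name `bind_cover` are assumptions made here for illustration.

```python
from itertools import product

def bind_cover(clauses):
    """Cover all consistent solutions of a conjunction of disjunctions.

    Returns a dict mapping each variable to the set of values whose union
    it is bound to.
    """
    cover = {}
    for choice in product(*clauses):          # one candidate set per clause
        binding, consistent = {}, True
        for candidate_set in choice:
            for var, value in candidate_set:
                # A variable bound to two different constants is discarded.
                if binding.setdefault(var, value) != value:
                    consistent = False
        if consistent:
            for var, value in binding.items():
                cover.setdefault(var, set()).add(value)
    return cover

# (A1 = a ∨ B1 = a) ∧ (A1 = b ∨ B1 = b)
clauses = [
    [{("A1", "a")}, {("B1", "a")}],
    [{("A1", "b")}, {("B1", "b")}],
]
cover = bind_cover(clauses)
```

For the formula of the alternate/2 example, the two consistent choices are A1 = a, B1 = b and A1 = b, B1 = a, so both variables end up bound to the union of a and b.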

4 Preliminary experimental evaluation 

In order to study the practicality of our method we have implemented a prototype
analyzer in Ciao (http://www.ciaohome.org [4]) and processed a representative set of
benchmarks taken mostly from the PLAI (the Ciao program analyzer) and GAIA [22]
sets. We chose the Non-deterministic Finite Tree Automaton (NFTA)-based analysis [9]
for comparison. We believe that this is a fair comparison since it also over-approximates
the success set of a program in a bottom-up fashion (inferring regular types). Its
implementation is publicly available (http://saft.ruc.dk/Tattoo/index.php) and it is
also written in Ciao. We decided not to compare herein with top-down, widening-based
type analyses (e.g., [16,22,23]), since it is not clear how they relate to our method. We
leave this comparison as interesting future work.

The results are shown in Table 1. The fourth and seventh columns show analysis
times in seconds for both analyses on an Intel Core Duo 1.33GHz CPU, 1GB RAM,
and Ubuntu 8.10 Linux OS. The time for reading the program and generating the
constraints is omitted because it is always negligible compared to the analysis time.
Column # Desc. shows the number of type descriptors (i.e., predicate argument positions)
which will be considered for the accuracy test (all type descriptors are considered for
the timing results). For simplicity, we report only on the argument positions belonging
to the main predicate. Note that execution of these predicates often reaches all the
predicates in the program and thus the accuracy of the types for those positions is often
a good summary of precision for the whole program. Columns labeled Prec. show the
number of type descriptions inferred by NFTA and our approach that are different from
type any. To test precision further, we have added to each benchmark a query that fails,
which is shown in the last column (Bad call type). Columns labeled Error show whether
this is captured by the analysis or not.

Regarding accuracy, our experiments show that parameterized types allow inferring
type descriptors with significantly better precision. Our approach inferred type
descriptions different from any for every one of the 35 argument positions considered, while
NFTA inferred only 13. Moreover, those type descriptors were accurate enough to
capture type emptiness (i.e., failure) in 10 out of 13 cases, whereas NFTA could only catch
two errors. The cases of dnf, hanoi, and pvgabriel deserve special attention, since our
method could not capture emptiness for them. For dnf the types inferred are not
precise enough to capture that the second argument is not in disjunctive normal form. This
is due to the lack of inter-variable dependencies in our set-based approach. However,
emptiness is easily captured for other calls such as dnf(X,b). A similar reasoning
holds for hanoi since the analysis infers that the types of the pegs are the union of a, b,
c, e, and f. Finally, the success of pvgabriel depends on a run-time condition, and thus
no static analysis can catch the possible error.

Regarding efficiency, we expected our analysis to be slower than less expressive
methods (like NFTA). Table 1 shows that this is indeed the case, but the differences are
reasonable for the selected set of benchmarks, especially considering the improvement
in accuracy. Further research is of course needed using larger programs (see Sect. 6),
but we find the results clearly encouraging.
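As a rough way to quantify the trade-off discussed above, the totals of Table 1 can be combined into a few derived ratios. The sketch below only performs arithmetic on the reported totals; the derived figures themselves (and the names `summarize` and `slowdown`) are not reported in the paper and are introduced here for illustration.

```python
# Totals copied from the last row of Table 1.
DESCRIPTORS = 35   # argument positions considered for the accuracy test
BENCHMARKS = 13    # number of benchmarks, per the accuracy discussion

NFTA = {"prec": 13, "time": 1.604, "errors": 2}
PTA = {"prec": 35, "time": 2.812, "errors": 10}


def summarize(totals):
    """Derive per-analysis ratios from the Table 1 totals."""
    return {
        # Fraction of descriptors inferred as something other than 'any'.
        "non_any_fraction": totals["prec"] / DESCRIPTORS,
        # Fraction of failing queries detected through type emptiness.
        "error_detection_rate": totals["errors"] / BENCHMARKS,
    }


# Ratio of total analysis times (PTA over NFTA).
slowdown = PTA["time"] / NFTA["time"]

if __name__ == "__main__":
    print("NFTA:", summarize(NFTA))
    print("PTA: ", summarize(PTA))
    print("total-time ratio PTA/NFTA: %.2f" % slowdown)
```

On these totals the parameterized analysis takes somewhat less than twice the total time of NFTA, while the fraction of non-any descriptors rises from 13/35 to 35/35.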

5 Related work 

Type inference, i.e., inferring type definitions from a program, has received a lot of 
attention in logic programming. Mishra and Reddy [18, 19] propose the inference of
ground regular trees that represent types, and compute an upper approximation of the
success set of a program. The types inferred are monomorphic. Zobel [24] also proposes
a type inference method for a program, where the type of a logic program is defined 
as a recursive (regular) superset of its logical consequences. However, the inference 
procedure does not derive truly polymorphic types: the type variables are just names 
for types that are defined by particular type rules. 

In [17] an inference method called type reconstruction is defined, which derives
types for the predicates and variables of a program. The types are polymorphic but they 
are fixed in advance. On the contrary, our inference method constructs type definitions 
during the analysis. In [8] the idea is that the least set-based model of a logic program
can be seen as the exact Herbrand model of an approximate logic program. This
approximate program is a regular unary program, which has a specific syntactic form that
limits its computational power. For example, type parameters cannot be expressed. The
work of [11, 20] also builds regular unary programs, and their type inference derives
such programs. However, the types remain monomorphic.

Heintze and Jaffar [12] defined an elegant method for semantic approximation,
which was the origin of set-based analysis of logic programs. The semantics computes,
using set substitutions, a finite representation of a model of the program that
is an approximation of its least model. Unfortunately, the algorithm of [12] was rather
complicated and practical aspects were not addressed. Such practical aspects were
addressed instead in subsequent work [14, 13, 15], simplifying the process. However, our
equations are still simpler: for example, we do not make use of projections, i.e.,
expressions of the form $f^{-1}_{(i)}(x)$, where $f$ is a constructor, the meaning of, e.g.,
$y = f^{-1}_{(1)}(x)$ being that $x \supseteq f(y, \ldots)$. Also, and more importantly, these
approaches do not take advantage of parameters, and the types obtained are again always
monomorphic.
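For readers unfamiliar with projections, the following minimal sketch shows what such an operator computes on a finite set of ground terms. It is given only to illustrate the construct that our equations avoid; the tuple encoding of terms and the function name `projection` are assumptions of this sketch.

```python
def projection(constructor, index, terms):
    """Return the set of index-th arguments (1-based) of all terms in
    `terms` whose principal functor is `constructor`.

    Hypothetical encoding: a compound term f(t1, ..., tn) is the tuple
    ("f", t1, ..., tn); a constant is a plain string.
    """
    if index < 1:
        raise ValueError("argument positions are 1-based")
    result = set()
    for term in terms:
        is_compound = isinstance(term, tuple)
        if is_compound and term[0] == constructor and len(term) > index:
            result.add(term[index])
    return result


if __name__ == "__main__":
    # x = { f(a, b), f(c, d), g(a), e }
    x = {("f", "a", "b"), ("f", "c", "d"), ("g", "a"), "e"}
    y = projection("f", 1, x)   # first arguments of the f-terms in x
    print(sorted(y))
```

In this example every element of `y` occurs as the first argument of some term `f(...)` in `x`, which is the relation the projection notation expresses.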

Type inference has also been approached using the technique of abstract interpreta- 
tion. A fixpoint is computed, in most cases with the aid of a widening operation to limit 
the growth of the type domain. This technique is also used in [11]. However, none of
the analyses proposed to date (e.g., [16, 22, 23]) is parametric. It is not clear how the
approach based on abstract domains with widening compares to the set constraints-based
approach. An interesting avenue for future work, however, is to define our analysis as
a fixpoint. This is possible in general due to a result by Cousot and Cousot [6], which
shows that rewriting systems like ours can be defined alternatively in terms of a fixpoint 
computation. 

Directional types [1, 5] are based on viewing a predicate as a procedure that maps
a call type to a success type. This captures some dependencies between arguments, but 
they are restricted to monomorphic types. It would be interesting to build this idea into 
our approach. We believe that some imprecision found in analyzing recursive calls with 
input arguments could be alleviated by using equations which captured both calls and 
successes. The relation of these with directional types might be worth investigating. 

Bruynooghe, Gallagher, Humbeeck, and Schrijvers [3, 21] develop analyses for type
inference which are also based on the set constraints approach. However, their analyses
infer well-typings, so that the result is not an approximation of the program success set,
as in our case. As a consequence, their algorithm is simpler, mainly because they do not
need to deal with intersection. In [21] the monomorphic analysis of [3] is extended to the
polymorphic case, using types with an expressiveness comparable to our parameterized
types.

The set constraints approach to type analysis is also taken in NFTA [9], which is
probably the closest to ours. However, the types inferred in this analysis, which are
approximations of the program success set, are not parametric and as a result the analysis
is less precise than ours, as shown in our experiments. The work of [2] aims at inferring
parametric types that approximate the success set. Although independently developed,
it includes some of the ideas we propose. However, the authors do not use the set-based
approach, and their inference algorithm is rather complex. This complexity led the
authors to abandon that line of work, leaving the approach undeveloped, and to switch
instead to well-typings.



6 Concluding remarks and future work 



To conclude, using the set constraints approach we have proposed a simple type infer- 
ence based on the rewriting of equations which, despite its simplicity, allows enhanced 
expressiveness. This is achieved by using type variables with a global scope as true pa- 
rameters of the equations. The improved expressiveness allows for better precision, at 
an additional cost in efficiency, which is, nevertheless, not high. 

During our tests, we have identified potential bottlenecks in our analysis which may
appear in large programs and which are worth investigating in future research. Some
practical issues that we have not considered were already addressed by Heintze in [13];
in this paper we have concentrated instead on the precision and soundness of our
approach. In particular, it is well-known that a naive DNF expansion may make the size of
an expression grow exponentially. However, we use simple rules for minimizing the number
of expressions (such as computing only intersections that will survive after simplification)
which work quite well in practice. Even for larger programs we expect these rules to
be effective, based on the experience of [14]. An efficient method for storing the new
intersections generated in line 7 in SIMP is also needed for the scalability of the system.
We use the same simple technique as [14]. Note that the size of the table is in the worst
case $2^N$, where $N$ is the number of original variables in the program. However, the size
of the table actually grows almost linearly in our experiments (we omitted these results
due to space limitations). Even if the size of the table were to grow faster, we believe
that we can mitigate this effect with techniques similar to those of [9, 10] (e.g., use of
BDDs), because of the high level of redundancy. We have also observed that the replacement
performed in lines 9 and 28 of SOLVE may be expensive for large programs. We think
that there are many opportunities for reducing this limitation (e.g., dependency directed
updating [14]).
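The worst-case bound on the intersection table can be made concrete with a small sketch. This is not the data structure used in our prototype nor the one of Heintze; the class `IntersectionTable` and the helper `fill_worst_case` are hypothetical and serve only to show why the table is bounded by the number of subsets of the original variables.

```python
from itertools import combinations


class IntersectionTable:
    """Memo table mapping a set of variables to one name for their intersection."""

    def __init__(self):
        self._names = {}

    def name_for(self, variables):
        """Return a stable name for the intersection of `variables`."""
        key = frozenset(variables)
        if not key:
            raise ValueError("an intersection needs at least one variable")
        if len(key) == 1:
            # A single variable needs no table entry.
            return next(iter(key))
        if key not in self._names:
            self._names[key] = "I%d" % len(self._names)
        return self._names[key]

    def __len__(self):
        return len(self._names)


def fill_worst_case(variables):
    """Request every intersection of two or more distinct variables."""
    table = IntersectionTable()
    for size in range(2, len(variables) + 1):
        for subset in combinations(variables, size):
            table.name_for(subset)
    return table


if __name__ == "__main__":
    table = fill_worst_case(["A", "B", "C", "D"])
    print(len(table), "entries for N = 4; bound 2**N =", 2 ** 4)
```

Because the key is an unordered set, requesting the same intersection twice (in any order) reuses the same entry, so the table can never hold more entries than there are subsets of the N original variables. In the sketch, subsets of size two or more give 2^N - N - 1 entries, which stays within the 2^N bound stated above.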